@zhyass zhyass commented Nov 28, 2025

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

This PR introduces Git-like branching and tagging for Databend's Fuse tables, inspired by Apache Iceberg. Users can create independent branches for development/testing and read-only tags for marking important snapshots.

1. Core Concepts

Branch

  • Writable: Supports INSERT, UPDATE, DELETE, and other write operations (not implemented in this PR)
  • Independent Evolution: Each branch maintains its own snapshot chain

Tag

  • Read-only: Does not support write operations, only used to mark important snapshots

2. SQL Syntax

2.1 Creating Branches or Tags

ALTER TABLE <database>.<table> CREATE BRANCH | TAG <name> 
[AT (
    SNAPSHOT => '<snapshot_id>' |
    TIMESTAMP => <timestamp> |
    STREAM => <stream_name> |
    OFFSET => <time_interval> |
    BRANCH => <branch_name> |
    TAG => <tag_name>
)]
[RETAIN <n> DAYS | SECONDS];

Parameters:

  • BRANCH | TAG: Specify whether to create a branch or tag
  • AT: Specify the point in time to base the creation on (optional, defaults to current snapshot)
    • SNAPSHOT: Based on a specific snapshot ID
    • TIMESTAMP: Based on a specific timestamp
    • STREAM: Based on the current position of a Stream
    • OFFSET: Based on a relative time offset
    • BRANCH: Based on the current state of another branch
    • TAG: Based on a tag
  • RETAIN: Set the retention period for the branch or tag (optional, defaults to none)

Examples:

-- Create a development branch based on current state
ALTER TABLE sales.orders CREATE BRANCH dev;

-- Create a test branch from the table state at a given timestamp, retain for 7 days
ALTER TABLE sales.orders CREATE BRANCH test 
AT (TIMESTAMP => '2024-11-27 00:00:00') 
RETAIN 7 DAYS;

-- Create a tag for current state
ALTER TABLE sales.orders CREATE TAG v1;

-- Create a tag based on a specific snapshot
ALTER TABLE sales.orders CREATE TAG backup_before_migration
AT (SNAPSHOT => '9828b23f74664ff3806f44bbc1925ea5');

2.2 Dropping Branches or Tags

ALTER TABLE <database>.<table> DROP BRANCH | TAG <name>;

Note: Drop operations are irreversible. Use with caution.

Examples:

-- Drop development branch
ALTER TABLE sales.orders DROP BRANCH dev;

-- Drop tag
ALTER TABLE sales.orders DROP TAG v1;

2.3 Querying Branch Data

-- Query data from a specific branch (similar to Git's remote/branch syntax)
SELECT * FROM <database>.<table>/<branch_name>;

-- Query development branch
SELECT * FROM sales.orders/dev;

3. Data Structure Design

pub struct TableMeta {
    // ... other fields ...
    
    /// Stores all branch and tag references
    pub refs: BTreeMap<String, SnapshotRef>,
}

pub struct SnapshotRef {
    /// The unique id of the reference.
    pub id: u64,
    /// After this timestamp, the reference becomes inactive.
    pub expire_at: Option<DateTime<Utc>>,
    /// The type of the reference.
    pub typ: SnapshotRefType,
    /// The location of the snapshot that this reference points to.
    pub loc: String,
}

pub enum SnapshotRefType {
    Branch = 0,
    Tag = 1,
}
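
As a rough illustration of how these structures might be consulted, the sketch below resolves a named reference and treats it as inactive once `expire_at` has passed. It is not code from this PR: `resolve_ref` and `is_expired` are hypothetical helpers, and `expire_at` is modelled as Unix seconds (`u64`) instead of `DateTime<Utc>` so the snippet needs only the standard library.

```rust
use std::collections::BTreeMap;

/// Kind of reference, mirroring `SnapshotRefType` above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SnapshotRefType {
    Branch = 0,
    Tag = 1,
}

/// Simplified stand-in for `SnapshotRef`.
/// Assumption: `expire_at` is Unix seconds rather than `DateTime<Utc>`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SnapshotRef {
    pub id: u64,
    pub expire_at: Option<u64>,
    pub typ: SnapshotRefType,
    pub loc: String,
}

impl SnapshotRef {
    /// Hypothetical helper: a reference with no `expire_at` never expires;
    /// otherwise it is inactive strictly after that timestamp.
    pub fn is_expired(&self, now: u64) -> bool {
        matches!(self.expire_at, Some(t) if now > t)
    }
}

/// Hypothetical helper: look up `name` in `TableMeta::refs` and return it
/// only if it has the expected type and is still active at `now`.
pub fn resolve_ref<'a>(
    refs: &'a BTreeMap<String, SnapshotRef>,
    name: &str,
    typ: SnapshotRefType,
    now: u64,
) -> Option<&'a SnapshotRef> {
    refs.get(name)
        .filter(|r| r.typ == typ && !r.is_expired(now))
}
```

With a `dev` branch whose `expire_at` is `Some(1_000)`, `resolve_ref(&refs, "dev", SnapshotRefType::Branch, 1_000)` still returns the reference, while the same call with `now = 1_001` returns `None`.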

4. Storage Layout

<table_prefix>/
├── _ss/                          # Main branch snapshot directory
│   ├── <snapshot_file>
│   └── ...
├── _refs/                        # Branch and tag directory
│   ├── <id1>/                    # dev branch
│   │   ├── <snapshot_file>
│   │   └── ...
│   ├── <id2>/                    # test branch
│   │   └── ...
│   └── <id3>/                    # v1 tag
│       └── <snapshot_file>
├── _sg/                          # Segment files (shared)
└── _b/                           # Block files (shared)

  • Main branch snapshots are stored in the _ss/ directory
  • Branch and tag snapshots are stored in _refs/<ref_id>/ directories
  • Segment and Block files are shared across all branches, saving storage space
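
The layout above can be sketched as a small path helper. This is only an illustration under assumptions: `SnapshotChain`, `snapshot_dir`, and `shared_dirs` are hypothetical names that do not come from this PR, and the table prefix is treated as an opaque string.

```rust
/// Which snapshot chain a path belongs to (hypothetical type).
/// `Main` is the table's default chain; `Ref(id)` is a branch or tag,
/// identified by `SnapshotRef::id`.
pub enum SnapshotChain {
    Main,
    Ref(u64),
}

/// Directory holding the snapshot files of the given chain.
pub fn snapshot_dir(table_prefix: &str, chain: &SnapshotChain) -> String {
    // Tolerate a trailing slash on the prefix.
    let prefix = table_prefix.trim_end_matches('/');
    match chain {
        SnapshotChain::Main => format!("{prefix}/_ss/"),
        SnapshotChain::Ref(id) => format!("{prefix}/_refs/{id}/"),
    }
}

/// Segment and block directories; these do not depend on the chain
/// because the files are shared by the main branch and all references.
pub fn shared_dirs(table_prefix: &str) -> (String, String) {
    let prefix = table_prefix.trim_end_matches('/');
    (format!("{prefix}/_sg/"), format!("{prefix}/_b/"))
}
```

Only the snapshot directory varies per reference, which is why creating a branch or tag does not copy segment or block data.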

5. Vacuum and GC Integration

NOTE: If a branch or tag has expired, it will be cleaned up during vacuum and purge.

5.1 Tag Processing

Tags are read-only and hold a single snapshot:

  1. Read head snapshot: Read the tag's head snapshot
  2. Act as GC Root: Tag snapshot serves as one of the GC roots, protecting its segments and blocks from cleanup
  3. No cleanup needed: Tag itself doesn't need snapshot cleanup (read-only)
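
To make the GC-root idea concrete, here is a minimal sketch in which every root (a tag's snapshot, or the root chosen for a branch) contributes the segments it references to a protected set, and only segments outside that set are candidates for removal. The `Snapshot` struct and both functions are simplified assumptions for illustration; the actual vacuum code in this PR also has to deal with blocks and other details not modelled here.

```rust
use std::collections::BTreeSet;

/// Simplified snapshot (assumption): only the segment locations it references.
pub struct Snapshot {
    pub segments: Vec<String>,
}

/// Union of the segments referenced by all GC roots.
pub fn protected_segments(roots: &[Snapshot]) -> BTreeSet<String> {
    roots
        .iter()
        .flat_map(|s| s.segments.iter().cloned())
        .collect()
}

/// Segments found in storage that no GC root references.
/// Only these would be eligible for cleanup in this sketch.
pub fn orphan_segments(all: &[String], roots: &[Snapshot]) -> Vec<String> {
    let protected = protected_segments(roots);
    all.iter()
        .filter(|s| !protected.contains(*s))
        .cloned()
        .collect()
}
```

A segment referenced only by a tag stays protected for as long as the tag exists, even after the main branch has stopped referencing it.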

5.2 Branch Processing

Branches need snapshot cleanup:

  1. Select GC Root:
    • RetentionPolicy applies ByTimePeriod first.
    • ByNumOfSnapshotsToKeep is only used when no snapshots are expired by time.
    • Rationale: For long-inactive branches, snapshot-count based retention may produce a very old GC root timestamp, which prevents effective cleanup.
  2. Collect snapshots_to_gc: Get the list of snapshots to be cleaned
  3. Protect segments/blocks: Even if gc_root cannot be obtained, use earliest snapshot as gc_root to protect data
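
One possible reading of the selection steps above, as a sketch only. Assumptions that are not confirmed by this PR: the branch's chain is given newest-first, timestamps and the retention period are plain seconds, the head snapshot is always kept, and `SnapshotInfo`, `GcPlan`, and `plan_branch_gc` are hypothetical names.

```rust
/// Simplified snapshot descriptor (assumption): an id and a Unix timestamp.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SnapshotInfo {
    pub id: u64,
    pub timestamp: u64,
}

/// Result of planning: the chosen root and the snapshots older than it.
#[derive(Debug, PartialEq, Eq)]
pub struct GcPlan {
    pub gc_root: u64,
    pub snapshots_to_gc: Vec<u64>,
}

/// `chain` must be ordered newest-first. Returns `None` for an empty chain.
pub fn plan_branch_gc(
    chain: &[SnapshotInfo],
    now: u64,
    retention_secs: u64,
    keep: Option<usize>,
) -> Option<GcPlan> {
    if chain.is_empty() {
        return None;
    }
    let cutoff = now.saturating_sub(retention_secs);
    // Snapshots still inside the retention window form a prefix of the chain.
    let live = chain.iter().take_while(|s| s.timestamp >= cutoff).count();

    let root_idx = if live < chain.len() {
        // Time-based policy: some snapshots expired. The oldest live snapshot
        // becomes the root; the head is kept even if it expired too.
        live.max(1) - 1
    } else if let Some(n) = keep.filter(|n| (1..chain.len()).contains(n)) {
        // Count-based policy, used only when nothing expired by time.
        n - 1
    } else {
        // No root from either policy: fall back to the earliest snapshot,
        // so nothing is collected and all data stays protected.
        chain.len() - 1
    };

    Some(GcPlan {
        gc_root: chain[root_idx].id,
        snapshots_to_gc: chain[root_idx + 1..].iter().map(|s| s.id).collect(),
    })
}
```

For a long-inactive branch every snapshot falls outside the time window, so the head becomes the root and all older snapshots are collected, whereas a purely count-based rule would have kept the last `n` snapshots regardless of their age.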

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):


@zhyass zhyass requested a review from drmingdrmer as a code owner November 28, 2025 16:39
@zhyass zhyass marked this pull request as draft November 28, 2025 16:39
@github-actions github-actions bot added the pr-feature this PR introduces a new feature to the codebase label Nov 28, 2025
@zhyass zhyass changed the title from "feat: table branching and tagging" to "feat: initial support for table branching and tagging" Nov 28, 2025
@zhyass zhyass marked this pull request as ready for review December 1, 2025 01:48
@zhyass zhyass requested review from drmingdrmer and removed request for drmingdrmer December 2, 2025 06:21

@drmingdrmer drmingdrmer left a comment

@drmingdrmer reviewed 35 of 143 files at r1.
Reviewable status: 35 of 143 files reviewed, all discussions resolved (waiting on @dantengsky and @SkyFan2002)

@zhyass zhyass force-pushed the feat_branch branch 3 times, most recently from 7cbe99c to 6bea34e Compare December 9, 2025 18:22

@drmingdrmer drmingdrmer left a comment

@drmingdrmer reviewed 4 of 143 files at r1, 8 of 34 files at r2, all commit messages.
Reviewable status: 41 of 143 files reviewed, all discussions resolved (waiting on @dantengsky and @SkyFan2002)

@drmingdrmer drmingdrmer left a comment

@drmingdrmer reviewed 3 of 11 files at r3, all commit messages.
Reviewable status: 43 of 149 files reviewed, all discussions resolved (waiting on @dantengsky and @SkyFan2002)

@zhyass zhyass marked this pull request as draft December 15, 2025 13:07
@zhyass zhyass force-pushed the feat_branch branch 5 times, most recently from e466b11 to f6c4f04 Compare December 19, 2025 16:27
@zhyass zhyass force-pushed the feat_branch branch 2 times, most recently from 3b8ba05 to 02c3171 Compare December 20, 2025 06:07
@zhyass zhyass marked this pull request as ready for review December 20, 2025 06:07

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment
